Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] "Overload" cython consume methods with type introspection #1787

Merged
merged 8 commits into from
Sep 17, 2017

Conversation

standage
Copy link
Member

@standage standage commented Sep 14, 2017

Even though the hashtable "consume_seqfile" methods are overloaded at the C++ level, for the most part only the variants that allow filename input have been exposed to the Python layer. This PR uses type introspection to dynamically determine whether the input is a parser object or a string to be interpreted as a filename, and expose both functionalities to Python.

Kudos to @camillescott for demonstrating this approach in #1774. I'm splitting this out into a separate PR to simplify code review and to expedite progress.

  • Is it mergeable?
  • make test Did it pass the tests?
  • make clean diff-cover If it introduces new functionality in
    scripts/ is it tested?
  • make format diff_pylint_report cppcheck doc pydocstyle Is it well
    formatted?
  • Did it change the command-line interface? Only backwards-compatible
    additions are allowed without a major version increment. Changing file
    formats also requires a major version number increment.
  • For substantial changes or changes to the command-line interface, is it
    documented in CHANGELOG.md? See keepachangelog
    for more details.
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Do the changes respect streaming IO? (Are they
    tested for streaming IO?)

@standage
Copy link
Member Author

On my laptop, 59 tests are failing, many of which are complaining about the _get_parser method, raising a TypeError: Requires a string-like sequence. I can't tell if this was already a problem in #1774, or if I introduced the problem by being to conservative in what I migrated over to this PR. It's difficult to tell since #1774 hasn't had a clean build yet.

@standage standage changed the title "Overload" cython consume methods with type introspection [WIP] "Overload" cython consume methods with type introspection Sep 14, 2017
@camillescott
Copy link
Member

Looks like you need to bring in the changes to _bstring in khmer/_oxli/utils.pyx

@camillescott
Copy link
Member

Ah wait, my bad -- it's because there isn't any logic to accept ReadParser objects. It relies on the various scripts and tests switching to the new Cython FastxParser. I suppose you could add another conditional to _get_parser to check for type ReadParser and pull out its pointer in the meantime?

if type(parser_or_filename) is FastxParser:
_parser = (<FastxParser>parser_or_filename)._this
elif type(parser_or_filename) is ReadParser:
_parser = # Unholy incantation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried several things but I can't figure out the right incantation to go here. If you have a few minutes tomorrow to pair program this tomorrow @camillescott that would be wonderful. Unless it's a trivial fix, in which case a comment here should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_parser = (<CPyReadParser_Object>parser_or_filename).parser ought to do it, along with a from khmer._oxli.parsing cimport CPyReadParser_Object up top

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Now clang is choking on the Cythonized C++ code. It looks like cython is using C++17 features. 😒

...
...
...
khmer/_oxli/graphs.cpp:16623:9: warning: use of the 'fallthrough' attribute is a C++1z extension [-Wc++1z-extensions]
        CYTHON_FALLTHROUGH;
        ^
khmer/_oxli/graphs.cpp:481:36: note: expanded from macro 'CYTHON_FALLTHROUGH'
      #define CYTHON_FALLTHROUGH [[fallthrough]]
                                   ^
khmer/_oxli/graphs.cpp:16636:9: warning: use of the 'fallthrough' attribute is a C++1z extension [-Wc++1z-extensions]
        CYTHON_FALLTHROUGH;
        ^
khmer/_oxli/graphs.cpp:481:36: note: expanded from macro 'CYTHON_FALLTHROUGH'
      #define CYTHON_FALLTHROUGH [[fallthrough]]
                                   ^
168 warnings and 2 errors generated.
error: command 'clang' failed with exit status 1
make: *** [test] Error 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I didn't see the trees for the forest. These warnings swamped the real issues.

...
...
...
khmer/_oxli/graphs.cpp:5723:43: error: member reference type 'PyObject *' (aka '_object *') is a pointer; did you mean to use '->'?
    __pyx_t_5 = __pyx_v_parser_or_filename.parser;                                                                                            
                ~~~~~~~~~~~~~~~~~~~~~~~~~~^                                                                          
                                          ->
khmer/_oxli/graphs.cpp:5723:44: error: no member named 'parser' in '_object'
    __pyx_t_5 = __pyx_v_parser_or_filename.parser;
                ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
...
...
...

This was my experience last night. I couldn't figure out how to handle objects vs pointers etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needed to add a * to _parser = (<CPyReadParser_Object>parser_or_filename).parser to make it _parser = (<CPyReadParser_Object*>parser_or_filename).parser. w00t!

deref(self._hg_this).\
consume_seqfile_and_tag_readparser[CpFastxReader](_parser,
total_reads,
n_consumed)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick pair programming session with @camillescott revealed I had failed to move these changes over from feature/cython_cleanup along with the others. This explained the latest set of failing tests.

@standage standage changed the title [WIP] "Overload" cython consume methods with type introspection [MRG] "Overload" cython consume methods with type introspection Sep 14, 2017
@standage
Copy link
Member Author

This PR is ready for review and merge! @ctb @luizirber @betatim

@codecov-io
Copy link

codecov-io commented Sep 15, 2017

Codecov Report

Merging #1787 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1787   +/-   ##
======================================
  Coverage    0.05%   0.05%           
======================================
  Files          78      78           
  Lines        9757    9757           
  Branches     2457    2457           
======================================
  Hits            5       5           
  Misses       9752    9752
Impacted Files Coverage Δ
scripts/filter-abund-single.py 0% <ø> (ø) ⬆️
scripts/load-into-counting.py 0% <ø> (ø) ⬆️
oxli/functions.py 0% <0%> (ø) ⬆️
scripts/abundance-dist-single.py 0% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae3c90e...f0f61cd. Read the comment docs.

Copy link
Member

@camillescott camillescott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that the with_reads_parser variants didn't get called more in the scripts... is sandbox covered? Will wait to merge until tomorrow.

_parser = (<FastxParser>parser_or_filename)._this
elif isinstance(parser_or_filename, ReadParser):
_parser = (<CPyReadParser_Object*>parser_or_filename).parser
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking -- there's a function in utils called is_str, which just checks for, well, strness. It might be better to do another elif with that here, and then raise a TypeError on the else. Otherwise, any TypeError is going to be raised by _bstring, which will be confusing to the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thought. Updated!

@camillescott camillescott merged commit 01e826b into master Sep 17, 2017
@camillescott camillescott deleted the fix/consume-filename-or-parser branch September 19, 2017 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants